AMENDMENTS TO THE CLAIMS:

1. (Currently Amended): A method for the segmentation of an audio stream into
semantic or syntactic units wherein the audio stream is provided in a digitized format, 
comprising the steps of: 

determining a fundamental frequency for the digitized audio stream; 

detecting changes of the fundamental frequency in the audio stream, wherein detecting 
the changes of the fundamental frequency includes providing a threshold value for 
estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, and wherein the voicedness of the fundamental frequency estimates lower than the
threshold value equals no voice, and wherein the voicedness of the fundamental
frequency estimates higher than the threshold value equals voice;

determining candidate boundaries for the semantic or syntactic units depending on the 
detected changes of the fundamental frequency; 

extracting and combining a plurality of prosodic features in [[the]] a neighborhood of the 
candidate boundaries; and 

determining boundaries for the semantic or syntactic units depending only on the 
combined plurality of prosodic features. 
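
Illustrative sketch (not part of the claims): the thresholding and candidate-boundary steps recited above might be pictured as follows. The function names, the per-frame voicedness values, the threshold of 0.5, and the choice to place a candidate boundary at each voice-to-no-voice transition are all assumptions made for illustration; the claims do not specify them, nor how a voicedness exactly equal to the threshold is treated (here it counts as no voice).

```python
from typing import List


def voiced_index(voicedness: List[float], threshold: float) -> List[int]:
    """Return 1 (voice) for frames above the threshold, else 0 (no voice)."""
    return [1 if v > threshold else 0 for v in voicedness]


def candidate_boundaries(index: List[int]) -> List[int]:
    """Return frame positions where the index drops from 1 (voice) to 0 (no voice)."""
    return [i for i in range(1, len(index)) if index[i - 1] == 1 and index[i] == 0]


# Hypothetical per-frame voicedness estimates in [0, 1].
voicedness = [0.9, 0.8, 0.2, 0.1, 0.7, 0.9, 0.3]
index = voiced_index(voicedness, threshold=0.5)   # [1, 1, 0, 0, 1, 1, 0]
boundaries = candidate_boundaries(index)          # [2, 6]
```

In this sketch the candidate positions are only a first pass; the final boundaries would still depend on the combined prosodic features, which are not modelled here.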

2. (Canceled) 

3. (Previously Presented): The method according to claim 1, wherein defining an 
index function for the fundamental frequency having a value = 0 if the voicedness of the 
fundamental frequency is lower than the threshold value and having a value = 1 if the 
voicedness of the fundamental frequency is higher than the threshold value. 

4. (Previously Presented): The method according to claim 3, wherein extracting the
plurality of prosodic features is in an environment of the audio stream where the value of 
the index function is = 0. 

5. (Original): The method according to claim 4, wherein the environment is a time 
period between 500 and 4000 milliseconds. 
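
Illustrative sketch (not part of the claims): one possible reading of claims 3 to 5 is that the environment is a stretch of the audio stream where the index function is 0 and whose duration lies between 500 and 4000 milliseconds. The code below follows that reading only; the function names, the 10 ms frame duration, and the example index sequence are hypothetical.

```python
from typing import List, Tuple


def unvoiced_runs(index: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs, end exclusive, for each run of 0s."""
    runs = []
    start = None
    for i, value in enumerate(index):
        if value == 0 and start is None:
            start = i                      # a run of 0s begins
        elif value != 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous frame
            start = None
    if start is not None:
        runs.append((start, len(index)))   # run extends to the end of the stream
    return runs


def runs_in_window(
    runs: List[Tuple[int, int]],
    frame_ms: float,
    min_ms: float = 500.0,
    max_ms: float = 4000.0,
) -> List[Tuple[int, int]]:
    """Keep runs whose duration in milliseconds lies within [min_ms, max_ms]."""
    return [(s, e) for s, e in runs if min_ms <= (e - s) * frame_ms <= max_ms]


# Hypothetical index function sampled every 10 ms.
index = [1] * 5 + [0] * 60 + [1] * 5 + [0] * 10 + [1] * 3
runs = unvoiced_runs(index)                 # [(5, 65), (70, 80)]
kept = runs_in_window(runs, frame_ms=10.0)  # [(5, 65)]: 600 ms kept, 100 ms dropped
```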

6. (Previously Presented): The method according to claim 1, wherein at least one 
prosodic feature is represented by the fundamental frequency. 

7. (Canceled) 

8. (Original): The method according to claim 1, further comprising first detecting 
speech and non-speech segments in the digitized audio stream and performing the steps 
of claim 1 thereafter only for detected speech segments. 

9. (Original): The method according to claim 8, wherein the detecting of speech and 
non-speech segments comprises utilizing the signal energy or signal energy changes, 
respectively, in the audio stream. 
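
Illustrative sketch (not part of the claims): speech and non-speech detection based on signal energy could, under simple assumptions, be a per-frame energy threshold. The frame length, the energy threshold, the sample values, and the function names below are hypothetical, and the claims do not prescribe this particular detector.

```python
from typing import List


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping full frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_flags(energies: List[float], energy_threshold: float) -> List[bool]:
    """Mark a frame as speech when its energy exceeds the threshold."""
    return [e > energy_threshold for e in energies]


# Hypothetical samples: silence, a loud frame, then a very quiet frame.
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.01, -0.01, 0.01, -0.01]
energies = frame_energies(samples, frame_len=4)
flags = speech_flags(energies, energy_threshold=0.01)  # [False, True, False]
```

Only frames flagged as speech would then be passed on to the fundamental frequency analysis.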

10. (Original): The method according to claim 1, further comprising the step of 
performing a prosodic feature classification based on a predetermined classification tree. 

11. (Currently Amended): An article of manufacture comprising a computer usable
medium having computer readable program code means embodied therein for causing 
segmentation of an audio stream into semantic or syntactic units, wherein the audio 
stream is provided in a digitized format, the computer readable program code means in 
the article of manufacture comprising computer readable program code means for 
causing a computer to effect: 

determining a fundamental frequency for the digitized audio stream; 

detecting changes of the fundamental frequency in the audio stream, wherein detecting
the changes of the fundamental frequency includes providing a threshold value for 
estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, and wherein the voicedness of the fundamental frequency estimates lower than the
threshold value equals no voice, and wherein the voicedness of the fundamental
frequency estimates higher than the threshold value equals voice;

determining candidate boundaries for the semantic or syntactic units depending on the 
detected changes of the fundamental frequency; 

extracting and combining a plurality of prosodic features in [[the]] a neighborhood of the 
candidate boundaries; and 

determining boundaries for the semantic or syntactic units depending only on the 
combined plurality of prosodic features. 

12. (Currently Amended): A digital audio processing system for segmentation of a 
digitized audio stream into semantic or syntactic units comprising: 

means for determining a fundamental frequency for the digitized audio stream;

means for detecting changes of the fundamental frequency in the audio stream, wherein 
detecting the changes of the fundamental frequency includes providing a threshold value 
for estimates of the fundamental frequency's voicedness and determining whether the 
voicedness of the fundamental frequency estimates are higher or lower than the threshold 
value, and wherein the voicedness of the fundamental frequency estimates lower than the
threshold value equals no voice, and wherein the voicedness of the fundamental
frequency estimates higher than the threshold value equals voice;

means for determining candidate boundaries for the semantic or syntactic units depending
on the detected changes of the fundamental frequency; 

means for extracting and combining a plurality of prosodic features in [[the]] a 
neighborhood of the candidate boundaries; and 

means for determining boundaries for the semantic or syntactic units depending only on 
the combined plurality of prosodic features. 

13. (Original): An audio processing system according to claim 12, further comprising 
means for generating an index function for the voicedness of the fundamental frequency 
having a value = 0 if the voicedness of the fundamental frequency is lower than a 
predetermined threshold value and having a value = 1 if the voicedness of the fundamental
frequency is higher than the threshold value. 

14. (Original): Audio processing system according to claim 12 or 13, further 
comprising means for detecting speech and non-speech segments in the digitized audio 
stream, particularly for detecting and analyzing the signal energy or signal energy 
changes, respectively, in the audio stream. 